Slope Centering: Making Shortcut Weights Effective
Abstract
Shortcut connections are a popular architectural feature of multi-layer perceptrons. It is generally assumed that by implementing a linear submapping, shortcuts assist the learning process in the remainder of the network. Here we find that this is not always the case: shortcut weights may also act as distractors that slow down convergence and can lead to inferior solutions. This problem can be addressed with slope centering, a particular form of gradient factor centering [2]. By removing the linear component of the error signal at a hidden node, slope centering effectively decouples that node from the shortcuts that bypass it. This eliminates the possibility of destructive interference from shortcut weights, and thus ensures that the benefits of shortcut connections are fully realized.
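To make the mechanism concrete, here is a minimal sketch of slope centering in a one-hidden-layer perceptron with shortcut weights. The layer sizes, tanh activations, squared-error loss, and the plain batch mean used as the centering average are illustrative assumptions, not the paper's exact setup:

    import numpy as np

    # Minimal sketch of slope centering in a one-hidden-layer perceptron with
    # shortcut (input-to-output) weights. Sizes, tanh units, and the batch mean
    # used as the centering average are illustrative assumptions.

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, batch = 4, 8, 2, 32

    W1 = rng.normal(0.0, 0.1, (n_in, n_hid))   # input  -> hidden
    W2 = rng.normal(0.0, 0.1, (n_hid, n_out))  # hidden -> output
    S  = rng.normal(0.0, 0.1, (n_in, n_out))   # shortcut: input -> output

    x = rng.normal(size=(batch, n_in))
    t = rng.normal(size=(batch, n_out))        # dummy targets

    # Forward pass: the output sums the hidden-layer path and the linear shortcut path.
    h = np.tanh(x @ W1)
    y = h @ W2 + x @ S
    e = y - t                                  # output error (squared-error gradient)

    # Backward pass with slope centering: subtract the mean activation slope at
    # each hidden node, so the error signal reaching that node loses its linear
    # component and the linear part of the mapping is left to the shortcuts S.
    slope = 1.0 - h ** 2                       # tanh'(net input)
    slope_centered = slope - slope.mean(axis=0, keepdims=True)
    delta_hidden = (e @ W2.T) * slope_centered

    grad_W1 = x.T @ delta_hidden / batch
    grad_W2 = h.T @ e / batch
    grad_S  = x.T @ e / batch

    lr = 0.1
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
    S  -= lr * grad_S

Subtracting the mean slope removes the linear component of the error signal at each hidden node, so the hidden weights no longer compete with the shortcut weights S for the linear part of the mapping; the batch mean above merely stands in for whatever running average one prefers.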
Similar Papers
Slope Centering: Making Shortcut Weights Effective
Shortcut connections are a popular architectural feature of multi-layer perceptrons. It is generally assumed that by implementing a linear sub-mapping, shortcuts assist the learning process in the remainder of the network. Here we find that this is not always the case: shortcut weights may also act as distractors that slow down convergence and can lead to inferior solutions. This problem can be a...
On Centering Neural Network Weight Updates
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals (Schraudolph and Sejnowski, 1996). Here we generalize this notion to all factors involved in the weight update, leading us to propose centering the slope of hidden unit activatio...
IDSIA-19-97, April 19, 1997, revised August 21, 1998: Centering Neural Network Gradient Factors
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slop...
Centering Neural Network Gradient Factors
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [2]. Here we generalize this notion to all factors involved in the network's gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope...
Accelerated Gradient Descent by Factor-Centering Decomposition
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network’s gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simpli...
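The related entries above revolve around the same idea of centering each pattern-dependent factor of the weight update about zero. As a companion to the sketch in the abstract, here is a minimal illustration of activity and error centering under the same illustrative assumptions (batch means, tanh hidden units); the bias bookkeeping that absorbs the removed means is omitted for brevity:

    import numpy as np

    # Activity and error centering: subtract the mean from the activity and
    # error factors of the weight update. Batch means and the omission of the
    # bias terms that absorb the removed means are illustrative simplifications.

    rng = np.random.default_rng(1)
    n_in, n_hid, n_out, batch = 4, 8, 2, 32

    W1 = rng.normal(0.0, 0.1, (n_in, n_hid))
    W2 = rng.normal(0.0, 0.1, (n_hid, n_out))

    x = rng.normal(size=(batch, n_in))
    t = rng.normal(size=(batch, n_out))

    h = np.tanh(x @ W1)
    y = h @ W2
    e = y - t

    # Center each gradient factor about zero.
    x_c = x - x.mean(axis=0, keepdims=True)   # centered input activities
    h_c = h - h.mean(axis=0, keepdims=True)   # centered hidden activities
    e_c = e - e.mean(axis=0, keepdims=True)   # centered error signals

    delta_hidden = (e_c @ W2.T) * (1.0 - h ** 2)

    grad_W1 = x_c.T @ delta_hidden / batch
    grad_W2 = h_c.T @ e_c / batch

    lr = 0.1
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2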